Rabin-Karp String Matching Algorithm
Interactive demonstration using rolling hash for efficient pattern search.
📝 Problem Statement
Find all occurrences of a Pattern ($P$) in a Text ($T$). Like the Naive algorithm, we allow **overlapping matches** and advance the pattern by **only 1 index** after a full or partial comparison.
Rabin-Karp introduces **hashing** to quickly filter out non-matching windows in the text, performing character-by-character comparisons only when hash values match.
Input Section
Initial Visualization:
💡 Rabin-Karp Algorithm Logic
Key Idea: Hashing Substrings
Instead of direct character comparisons for every position, Rabin-Karp calculates a **hash value** for the pattern and for each window of text of the same length as the pattern.
Algorithm Steps:
- **Choose Parameters**: Select a base ($d$) and a large prime modulus ($q$). A common choice for $d$ is the number of characters in the alphabet (e.g., 256 for ASCII).
- **Precompute Pattern Hash**: Calculate $h_P = \text{hash}(P)$.
- **Compute First Text Window Hash**: Calculate $h_{T_0} = \text{hash}(T[0 \dots M-1])$.
- **Iterate and Compare Hashes**:
- For each text window $T[i \dots i+M-1]$:
- **If $h_P == h_{T_i}$**: A potential match. Perform a **character-by-character check** (called a "spurious hit check") to confirm. If it's a true match, record index $i$.
- **If $h_P \neq h_{T_i}$**: A guaranteed mismatch. No need for character comparisons.
- For each text window $T[i \dots i+M-1]$:
- **Rolling Hash**: For the next window $T[i+1 \dots i+M]$, efficiently update $h_{T_{i+1}}$ from $h_{T_i}$ by "subtracting" the leading character $T[i]$ and "adding" the new trailing character $T[i+M]$.
$h_{new} = (d (h_{old} - T[old\_char] \cdot d^{M-1}) + T[new\_char]) \pmod q$(Where $d^{M-1}$ is precomputed as $h_d = d^{M-1} \pmod q$).
Our demo will use a base $d=10$ (treating characters as digits, 'A'=1, 'B'=2 etc.) and a prime modulus $q=13$. This simplifies visualization, though real-world implementations use larger values for better collision avoidance.
🎬 Step-by-Step Demo
Visualization
Output Log
Final Match Indices: None
📊 Algorithm Analysis
Best & Average Case Complexity
O(N + M)
When there are few or no hash collisions (spurious hits). Hash computation for $P$ is $O(M)$. Initial window hash is $O(M)$. Rolling hash for $N-M$ windows is $O(N-M)$. Confirmatory character checks are rare. Total: $O(M) + O(N-M) = O(N+M)$.
Worst Case Complexity
O(N * M)
Occurs when every window of text produces a hash collision with the pattern's hash (many spurious hits), forcing a full character-by-character comparison for almost every shift. E.g., $T=\text{AAAAAAAAAAAA}$, $P=\text{AAAA}$.
Efficiency & Hash Collisions
The efficiency of Rabin-Karp highly depends on the choice of the hash function (base $d$ and prime modulus $q$). A good hash function minimizes collisions, bringing the average case close to $O(N+M)$, which is much faster than Naive's worst-case $O(N \cdot M)$.